Project Group - 17

Members: Can Balkose, Zep Van Boxtel, Stan Vos, Julia Michels

Student numbers: 6068383, 4903684, 4725603, 4996569

Research Objective

Requires data modeling and quantitative research in Transport, Infrastructure & Logistics

Research Question:

What was the effect of COVID-19 on transportation usage and mode choice across different regions and demographics in the Netherlands?

Objectives

-To analyze and visualize the impact of the COVID-19 pandemic on transportation usage and mode choice in different regions of the Netherlands.

-To identify the key demographic factors that influenced transportation mode choice during the pandemic.

-To understand how the pandemic affected transportation usage and mode choice differently in urban and rural areas.

-To understand how transportation behavior changed across demographics after the pandemic, and to draw conclusions about potential long-term impacts on transportation behavior.

Contribution Statement

Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling

Author 1:

Author 2:

Author 3:

Data Used

In [1]:
import pandas as pd
from scipy.signal import find_peaks
from scipy.signal import argrelextrema
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
In [2]:
# Distance covered per region, by degree of urbanization and mode of transport
data_distance_mode_urban = 'data/Distance_covered_on_different_urban_areas.csv'
df_urbanization_mode_urban = pd.read_csv(data_distance_mode_urban)
df_urbanization_mode_urban
Out[2]:
Year Region Mode of Transport Total Distance (billion km)
0 2018 Extremely urbanized Combined 46.8
1 2019 Extremely urbanized Combined 46.2
2 2020 Extremely urbanized Combined 31.4
3 2021 Extremely urbanized Combined 36.3
4 2022 Extremely urbanized Combined 40.9
... ... ... ... ...
195 2018 Not urbanized Other 2.0
196 2019 Not urbanized Other 2.1
197 2020 Not urbanized Other 1.5
198 2021 Not urbanized Other 1.4
199 2022 Not urbanized Other 1.3

200 rows × 4 columns

In [3]:
# Usage of public transportation per demographic group
data_usage_of_public_transport = 'data/Usage_of_public_transportation.csv'
df_usage_of_public_transport = pd.read_csv(data_usage_of_public_transport)
df_usage_of_public_transport
Out[3]:
Demographic Year Usage of public transportation (%)
0 Age: 12 to 17 years 2018 11.7
1 Age: 12 to 17 years 2019 10.7
2 Age: 12 to 17 years 2020 6.3
3 Age: 12 to 17 years 2021 6.3
4 Age: 12 to 17 years 2022 9.2
... ... ... ...
100 No driver's license; 17 years or older 2018 17.5
101 No driver's license; 17 years or older 2019 16.3
102 No driver's license; 17 years or older 2020 8.5
103 No driver's license; 17 years or older 2021 9.8
104 No driver's license; 17 years or older 2022 13.7

105 rows × 3 columns

In [4]:
# Traffic on Dutch highways on weekdays and at weekends, indexed to 2019 (2019 = 100)
data_traffic_highways = 'data/CBS Dutch highway traffic.csv'
df_data_traffic_highways = pd.read_csv(data_traffic_highways)
# Drop the last three rows of the file, which are not used in the weekly series
df_data_traffic_highways = df_data_traffic_highways.iloc[:-3]
df_data_traffic_highways
Out[4]:
Week Doordeweeks, 2020 (2019 = 100) In het weekeinde, 2020 (2019 = 100) Doordeweeks, 2021 (2019 = 100) In het weekeinde, 2021 (2019 = 100) Doordeweeks, 2022 (2019 = 100) In het weekeinde, 2022 (2019 = 100) Doordeweeks, 2023 (2019 = 100) In het weekeinde, 2023 (2019 = 100)
0 1 83.0 101.0 71.0 67.0 96.0 82.0 103.0 99.0
1 2 99.0 102.0 79.0 64.0 86.0 86.0 93.0 98.0
2 3 100.0 102.0 77.0 65.0 85.0 84.0 91.0 95.0
3 4 104.0 106.0 78.0 67.0 88.0 91.0 98.0 103.0
4 5 102.0 103.0 78.0 48.0 87.0 86.0 95.0 100.0
5 6 99.0 88.0 62.0 61.0 87.0 91.0 92.0 99.0
6 7 97.0 90.0 73.0 68.0 82.0 82.0 93.0 85.0
7 8 99.0 87.0 80.0 68.0 88.0 89.0 92.0 95.0
8 9 94.0 105.0 78.0 74.0 86.0 92.0 91.0 104.0
9 10 98.0 99.0 80.0 63.0 88.0 89.0 93.0 96.0
10 11 91.0 67.0 80.0 71.0 88.0 91.0 94.0 99.0
11 12 60.0 38.0 80.0 68.0 89.0 92.0 93.0 91.0
12 13 51.0 33.0 80.0 67.0 87.0 85.0 92.0 93.0
13 14 52.0 33.0 77.0 60.0 88.0 84.0 95.0 88.0
14 15 52.0 35.0 76.0 65.0 90.0 87.0 89.0 94.0
15 16 47.0 39.0 76.0 67.0 86.0 92.0 89.0 92.0
16 17 58.0 53.0 75.0 85.0 84.0 107.0 85.0 113.0
17 18 56.0 49.0 81.0 80.0 89.0 103.0 91.0 99.0
18 19 61.0 57.0 77.0 75.0 93.0 93.0 94.0 96.0
19 20 66.0 57.0 83.0 74.0 89.0 91.0 87.0 100.0
20 21 64.0 61.0 78.0 80.0 86.0 97.0 93.0 96.0
21 22 78.0 62.0 89.0 75.0 98.0 83.0 97.0 87.0
22 23 73.0 70.0 84.0 88.0 88.0 98.0 93.0 104.0
23 24 81.0 73.0 87.0 84.0 94.0 95.0 94.0 96.0
24 25 81.0 82.0 86.0 84.0 90.0 92.0 91.0 97.0
25 26 84.0 81.0 87.0 88.0 91.0 93.0 91.0 93.0
26 27 86.0 84.0 88.0 90.0 90.0 93.0 91.0 95.0
27 28 86.0 89.0 86.0 87.0 91.0 94.0 93.0 95.0
28 29 89.0 95.0 88.0 88.0 89.0 94.0 92.0 95.0
29 30 93.0 95.0 89.0 88.0 93.0 97.0 NaN NaN
30 31 93.0 91.0 88.0 87.0 92.0 93.0 NaN NaN
31 32 91.0 90.0 90.0 94.0 91.0 93.0 NaN NaN
32 33 88.0 91.0 90.0 96.0 90.0 98.0 NaN NaN
33 34 90.0 84.0 91.0 87.0 92.0 92.0 NaN NaN
34 35 90.0 86.0 91.0 90.0 94.0 91.0 NaN NaN
35 36 92.0 92.0 94.0 90.0 94.0 93.0 NaN NaN
36 37 90.0 92.0 92.0 96.0 92.0 89.0 NaN NaN
37 38 92.0 89.0 94.0 93.0 91.0 91.0 NaN NaN
38 39 89.0 74.0 93.0 93.0 91.0 80.0 NaN NaN
39 40 88.0 75.0 96.0 97.0 96.0 94.0 NaN NaN
40 41 82.0 76.0 92.0 98.0 91.0 93.0 NaN NaN
41 42 81.0 70.0 93.0 98.0 91.0 94.0 NaN NaN
42 43 79.0 68.0 92.0 90.0 94.0 97.0 NaN NaN
43 44 77.0 66.0 90.0 85.0 91.0 86.0 NaN NaN
44 45 78.0 68.0 90.0 83.0 93.0 94.0 NaN NaN
45 46 79.0 69.0 85.0 83.0 94.0 96.0 NaN NaN
46 47 81.0 72.0 86.0 80.0 93.0 97.0 NaN NaN
47 48 82.0 75.0 81.0 76.0 91.0 88.0 NaN NaN
48 49 83.0 73.0 85.0 80.0 92.0 95.0 NaN NaN
49 50 82.0 72.0 85.0 79.0 92.0 90.0 NaN NaN
50 51 76.0 63.0 78.0 72.0 87.0 79.0 NaN NaN
51 52 83.0 59.0 89.0 61.0 105.0 96.0 NaN NaN
52 53 86.0 65.0 NaN NaN NaN NaN NaN NaN
In [5]:
# Trips, travel distance and travel time per year, demographic and urbanization level
data_2018_2022 = 'data/2018_2022.csv'
df_data_2018_2022 = pd.read_csv(data_2018_2022)
df_data_2018_2022
Out[5]:
Year Demographic Urbanization Trips Travel distance Travel time
0 2018 Age: 18 to 24 years Extremely urbanised 1018 13747 530.4
1 2018 Age: 25 to 34 years Extremely urbanised 1020 15660 536.8
2 2018 Age: 35 to 49 years Extremely urbanised 1105 13138 492.2
3 2018 Age: 50 to 64 years Extremely urbanised 990 12828 480.1
4 2018 Age: 65 to 74 years Extremely urbanised 864 9695 443.8
... ... ... ... ... ... ...
433 2022 No driver's license, 17 years or older Extremely urbanised 706 6068 400.2
434 2022 No driver's license, 17 years or older Strongly urbanised 753 7465 391.6
435 2022 No driver's license, 17 years or older Moderately urbanised 724 6877 377.2
436 2022 No driver's license, 17 years or older Hardly urbanised 702 7676 349.6
437 2022 No driver's license, 17 years or older Not urbanised 646 7450 350.2

438 rows × 6 columns

Data Pipeline

In [6]:
# Filter out the 'Combined' rows so the aggregate is not counted alongside the individual modes
filtered_df_urbanization_mode_urban = df_urbanization_mode_urban[df_urbanization_mode_urban["Mode of Transport"] != 'Combined']

The pie charts below show the share of distance covered by each mode of transport in the Netherlands for each year from 2018 to 2022. They indicate a decline in the public transport share during the pandemic. Public transport usage recovered somewhat afterwards, but it did not return to pre-pandemic levels. One possible explanation is that fear and uncertainty surrounding the virus discouraged people from using public transport.
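
The shares that the pie charts display can also be tabulated, which makes year-to-year comparison easier than reading separate pies. The sketch below is a minimal illustration under stated assumptions: the `toy` frame, its mode names, and its distances are hypothetical stand-ins for `filtered_df_urbanization_mode_urban`, and `mode_share_by_year` is a helper introduced here, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for filtered_df_urbanization_mode_urban; the mode
# names and distances are illustrative and NOT taken from the CBS file.
toy = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2020, 2020, 2020],
    "Mode of Transport": ["Car", "Train", "Bike"] * 2,
    "Total Distance (billion km)": [60.0, 25.0, 15.0, 54.0, 10.0, 16.0],
})


def mode_share_by_year(df):
    """Return a Year x Mode table of distance shares in percent (each row sums to 100)."""
    totals = df.pivot_table(
        index="Year",
        columns="Mode of Transport",
        values="Total Distance (billion km)",
        aggfunc="sum",
    )
    # Divide every row by its own total to turn distances into shares
    return totals.div(totals.sum(axis=1), axis=0) * 100


shares = mode_share_by_year(toy)
print(shares.round(1))
```

Applied to the real frame, the same table would give one row per year from 2018 to 2022, so the change in each mode's share can be read down a single column.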

In [7]:
#pie chart to visualise the distance covered by mode of transport per year
years_to_visualize = [2018, 2019, 2020, 2021, 2022]
for year in years_to_visualize:
    df_year = filtered_df_urbanization_mode_urban[filtered_df_urbanization_mode_urban['Year'] == year]

    mode_distance = df_year.groupby('Mode of Transport')['Total Distance (billion km)'].sum()

    plt.figure(figsize=(3, 3))
    plt.pie(mode_distance, labels=mode_distance.index, autopct='%1.1f%%', startangle=140)
    plt.title(f'Distance Covered by Mode of Transport in {year}')
    plt.axis('equal')  
    plt.show()
In [8]:
# For the data for the Equivalised income groups

filtered_income_df_usage_of_public_transport = df_usage_of_public_transport[df_usage_of_public_transport['Demographic'].str.contains('Equ')]

Before the pandemic, many higher-income individuals in the Netherlands chose public transport. When the pandemic started, public transport usage dropped significantly, possibly due to health concerns linked to crowded spaces. The decline affected all income groups, but the animation shows that higher-income groups reduced their use of public transport the most.

Even after the effects of the pandemic faded in 2022, public transport usage among higher-income groups had not recovered as much as among lower-income groups.

In [9]:
fig = px.bar(
    filtered_income_df_usage_of_public_transport,
    x="Demographic",
    y="Usage of public transportation (%)",
    color='Demographic',  
    animation_frame="Year",
    range_y=[0, 20],
    title="Usage of Public Transportation Over Years by Income Group",
    labels={"Usage of public transportation (%)": "Usage (%)"},

)
fig.update_xaxes(categoryorder='total descending')

fig.show()

Age is another interesting demographic. As the animation below shows, public transport usage among younger age groups has recovered much better than among older age groups, who are at greater risk of severe illness from the virus. Younger age groups, who are expected to have lower incomes, also tend to use public transport more than older age groups, which is consistent with the income animation above.

In [21]:
# For the data for age

filtered_age_df_usage_of_public_transport = df_usage_of_public_transport[df_usage_of_public_transport['Demographic'].str.contains('Age')]
fig = px.bar(
    filtered_age_df_usage_of_public_transport,
    x="Demographic",
    y="Usage of public transportation (%)",
    color='Demographic',  
    animation_frame="Year",
    range_y=[0, 20],
    title="Usage of Public Transportation Over Years by Age Group",
    labels={"Usage of public transportation (%)": "Usage (%)"},

)
fig.update_xaxes(categoryorder='total descending')

fig.show()

The graph below shows public transport usage over the years by driver's license and car ownership. People without a driver's license showed the largest rebound in public transport usage, which suggests that people with a driver's license still avoid public transport post-pandemic.

In [10]:
filtered_driver_license_df_usage_of_public_transport = df_usage_of_public_transport[df_usage_of_public_transport['Demographic'].str.contains('river')]

fig = px.line(
    filtered_driver_license_df_usage_of_public_transport,
    x="Year",
    y="Usage of public transportation (%)",
    color="Demographic",
    title="Usage of Public Transportation Over Years by Driver License and Car Ownership",
    labels={"Usage of public transportation (%)": "Usage (%)"},
    markers=True
)
desired_years = [2018, 2019, 2020, 2021, 2022]

# "Year" is numeric, so tick positions must be numbers; only the labels are strings
fig.update_xaxes(tickvals=desired_years, ticktext=[str(year) for year in desired_years])

fig.show()

The graphs below show the distance covered in areas of the Netherlands with different levels of urbanization from 2018 to 2022. They show a clear and consistent decline in transportation usage during the pandemic. The level of urbanization did not appear to correlate strongly with the size of the decline, which suggests that individual characteristics of people played a more important role than location in shaping mobility patterns.
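
One way to compare the size of the decline across regions is to index each region's distance to a pre-pandemic year, so that regions with very different absolute totals share a scale. This is a sketch under assumptions: `index_to_baseline` is a hypothetical helper, and choosing 2019 as the baseline mirrors the "2019 = 100" convention of the highway data rather than anything stated for this dataset. The sample rows are the 'Combined' values for one region from the `Out[2]` preview.

```python
import pandas as pd

# 'Combined' rows for one region, copied from the Out[2] preview.
sample = pd.DataFrame({
    "Year": [2018, 2019, 2020, 2021, 2022],
    "Region": ["Extremely urbanized"] * 5,
    "Mode of Transport": ["Combined"] * 5,
    "Total Distance (billion km)": [46.8, 46.2, 31.4, 36.3, 40.9],
})


def index_to_baseline(df, baseline=2019):
    """Express each region's combined distance as an index (baseline year = 100)."""
    combined = df[df["Mode of Transport"] == "Combined"]
    wide = combined.pivot(index="Region", columns="Year",
                          values="Total Distance (billion km)")
    # Divide every row by its own baseline-year value
    return wide.div(wide[baseline], axis=0) * 100


indexed = index_to_baseline(sample)
print(indexed.round(1))
```

With all regions included, similar index values in 2020 across rows would support the statement that urbanization level is only weakly related to the decline, while diverging values would contradict it.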

In [12]:
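# Use only the 'Combined' rows so each region has a single total per year;
# otherwise lineplot would average the total together with the individual modes.
df = df_urbanization_mode_urban[df_urbanization_mode_urban["Mode of Transport"] == "Combined"]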

sns.set_style("whitegrid")

plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x="Year", y="Total Distance (billion km)", hue="Region", marker="o", palette=sns.color_palette("hsv", len(df['Region'].unique())), errorbar=None)

plt.title("Effects of COVID-19 on Distance Traveled in Different Regions")
plt.xlabel("Year")
plt.ylabel("Total Distance (billion km)")
plt.legend(title="Region", loc='center left', bbox_to_anchor=(1, 0.5)) 
plt.grid(True)
plt.show()
In [19]:
sns.set_style("whitegrid")

plt.figure(figsize=(12, 6))
sns.lineplot(data=df_data_2018_2022, x="Year", y="Travel time", style="Urbanization", markers=True, dashes=False, errorbar=None)

plt.title("Travel Time per Region")
plt.xlabel("Year")
plt.ylabel("Travel time")
plt.legend(title="Urbanization", loc='center left', bbox_to_anchor=(1, 0.5))
plt.grid(True)
plt.show()
In [20]:
sns.set_style("whitegrid")

plt.figure(figsize=(12, 6))

for column in df_data_traffic_highways.columns[1:]:
    sns.lineplot(x="Week", y=column, data=df_data_traffic_highways, label=column)

plt.legend(loc="upper right")
plt.xlabel("Week")
plt.ylabel("Traffic index (2019 = 100)")
plt.title("Dutch Highway Traffic per Week Relative to 2019")

plt.show()
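
The wide highway table has one column per day type and year, with Dutch labels ("Doordeweeks" means on weekdays, "In het weekeinde" means at the weekend). Reshaping it to long form makes it easier to locate the deepest dip per series. The sketch below is illustrative only: `to_long` and the output column names are hypothetical, and the sample holds weeks 11 to 17 of 2020 copied from the `Out[4]` table.

```python
import pandas as pd

# Weeks 11-17 of 2020, copied from the Out[4] table.
sample = pd.DataFrame({
    "Week": [11, 12, 13, 14, 15, 16, 17],
    "Doordeweeks, 2020 (2019 = 100)": [91.0, 60.0, 51.0, 52.0, 52.0, 47.0, 58.0],
    "In het weekeinde, 2020 (2019 = 100)": [67.0, 38.0, 33.0, 33.0, 35.0, 39.0, 53.0],
})


def to_long(df):
    """Reshape the wide table to one row per (week, day type, year)."""
    long = df.melt(id_vars="Week", var_name="series", value_name="index_2019")
    # Column names look like 'Doordeweeks, 2020 (2019 = 100)'
    parts = long["series"].str.extract(r"^(?P<day_type>.+), (?P<year>\d{4}) ")
    parts["day_type"] = parts["day_type"].map(
        {"Doordeweeks": "Weekday", "In het weekeinde": "Weekend"}
    )
    parts["year"] = parts["year"].astype(int)
    out = pd.concat([long[["Week", "index_2019"]], parts], axis=1)
    # Weeks without a measurement (NaN) carry no information
    return out.dropna(subset=["index_2019"])


long_df = to_long(sample)
# Row with the lowest index for each day type and year (first week wins on ties)
lowest = long_df.loc[long_df.groupby(["day_type", "year"])["index_2019"].idxmin()]
print(lowest)
```

In this sample the weekday series bottoms out in week 16 and the weekend series in week 13, with week 14 tied at the same value. The long form can also be passed to `sns.lineplot` with `hue="year"` and `style="day_type"` instead of looping over columns.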